Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation
Learning new task-specific skills from a few trials is a fundamental
challenge for artificial intelligence. Meta reinforcement learning (meta-RL)
tackles this problem by learning transferable policies that support few-shot
adaptation to unseen tasks. Despite recent advances in meta-RL, most existing
methods require access to the environmental reward function of new tasks to
infer the task objective, which is not realistic in many practical
applications. To bridge this gap, we study the problem of few-shot adaptation
in the context of human-in-the-loop reinforcement learning. We develop a
meta-RL algorithm that enables fast policy adaptation with preference-based
feedback. The agent can adapt to new tasks by querying a human's preference
between behavior trajectories instead of using per-step numeric rewards. By
extending techniques from information theory, our approach can design query
sequences to maximize the information gain from human interactions while
tolerating the inherent error of a non-expert human oracle. In experiments, we
extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a
variety of meta-RL benchmark tasks and demonstrate substantial improvement over
baseline algorithms in terms of both feedback efficiency and error tolerance.
Comment: Thirty-sixth Conference on Neural Information Processing Systems
(NeurIPS 2022)
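The query-selection idea in the abstract above (choose preference queries that maximize information gain while tolerating oracle error) can be illustrated with a small sketch. This is not the ANOLE algorithm itself. It assumes a finite set of candidate task hypotheses, binary preference replies, and an oracle that flips its reply with a fixed probability `eps`; the function names and the answer table are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def posterior(prior, answers, reply, eps):
    """Bayes update after one reply from a noisy oracle (assumed model).

    answers[k] is the reply (0 or 1) a noiseless oracle would give if
    hypothesis k were the true task; eps is the assumed flip probability.
    """
    likelihood = np.where(np.asarray(answers) == reply, 1.0 - eps, eps)
    post = np.asarray(prior, dtype=float) * likelihood
    return post / post.sum()

def info_gain(prior, answers, eps):
    """Expected entropy reduction over hypotheses from asking one query."""
    prior = np.asarray(prior, dtype=float)
    gain = entropy(prior)
    for reply in (0, 1):
        likelihood = np.where(np.asarray(answers) == reply, 1.0 - eps, eps)
        p_reply = float((prior * likelihood).sum())
        if p_reply > 0:
            gain -= p_reply * entropy(prior * likelihood / p_reply)
    return gain

def best_query(prior, answer_table, eps):
    """Index of the query (row of answer_table) with the largest gain."""
    gains = [info_gain(prior, row, eps) for row in answer_table]
    return int(np.argmax(gains))
```

Under this assumed noise model, a query that splits the remaining hypotheses evenly is preferred, and a noisier oracle lowers the gain of every query without changing the Bayes update rule.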
Self-Organized Polynomial-Time Coordination Graphs
A coordination graph is a promising approach to model agent collaboration in
multi-agent reinforcement learning. It conducts a graph-based value
factorization and induces explicit coordination among agents to complete
complicated tasks. However, one critical challenge in this paradigm is the
complexity of greedy action selection with respect to the factorized values. It
corresponds to the decentralized constraint optimization problem (DCOP); both
this problem and its constant-ratio approximation are NP-hard. To bypass this
systematic hardness, this paper proposes a novel method, named Self-Organized
Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph
classes to guarantee the accuracy and the computational efficiency of
collaborated action selection. SOP-CG employs dynamic graph topology to ensure
sufficient value function expressiveness. The graph selection is unified into
an end-to-end learning paradigm. In experiments, we show that our approach
learns succinct and well-adapted graph topologies, induces effective
coordination, and improves performance across a variety of cooperative
multi-agent tasks.
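To illustrate why restricting the graph class can make greedy action selection tractable, here is a minimal sketch that assumes a tree-structured coordination graph. The abstract does not say which graph classes SOP-CG uses, so the tree assumption, the function names, and the payoff layout are all hypothetical. On a tree, the maximizing joint action can be found exactly by dynamic programming in time linear in the number of agents, whereas enumeration grows exponentially.

```python
import itertools
import numpy as np

def select_joint_action(unary, pairwise, parent):
    """Exact argmax of a tree-factorized value (assumed structure).

    unary[i][a]        : individual payoff of agent i for action a
    pairwise[i][ap][a] : payoff on edge (parent[i], i); pairwise[0] is unused
    parent[i]          : parent of agent i, with parent[i] < i; agent 0 is root
    """
    n = len(unary)
    value = [np.asarray(u, dtype=float) for u in unary]
    best_reply = [None] * n
    # Upward pass: each child reports its best subtree value per parent action.
    for i in range(n - 1, 0, -1):
        table = np.asarray(pairwise[i], dtype=float) + value[i][None, :]
        best_reply[i] = table.argmax(axis=1)
        value[parent[i]] = value[parent[i]] + table.max(axis=1)
    # Downward pass: fix the root action, then each child's best reply.
    actions = [0] * n
    actions[0] = int(value[0].argmax())
    for i in range(1, n):
        actions[i] = int(best_reply[i][actions[parent[i]]])
    return actions, float(value[0].max())

def joint_value(unary, pairwise, parent, actions):
    """Factorized value of one joint action."""
    total = sum(unary[i][a] for i, a in enumerate(actions))
    total += sum(pairwise[i][actions[parent[i]]][actions[i]]
                 for i in range(1, len(actions)))
    return float(total)

def brute_force_value(unary, pairwise, parent):
    """Reference maximum by enumerating every joint action (exponential)."""
    ranges = [range(len(u)) for u in unary]
    return max(joint_value(unary, pairwise, parent, joint)
               for joint in itertools.product(*ranges))
```

With `n` agents and `A` actions each, the dynamic program touches `n - 1` tables of size `A × A`, while the brute-force reference evaluates `A ** n` joint actions; the two should agree on the maximum value for any tree.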
Measurement of neutron-induced fission cross sections of
235U and 238U are very important isotopes in nuclear energy systems. Their neutron-induced fission cross sections have been measured intensively and evaluated as standard up to 200 MeV. However, experimental data in the high-energy region are scarce. This work reports the measurement of 235,238U(n,f) cross sections relative to n-p scattering, performed at the China Spallation Neutron Source (CSNS) back-streaming neutron facility (Back-n). Preliminary results for the 235,238U(n,f) cross sections from 10 to 66 MeV are obtained; they generally follow the shape of the IAEA standard. However, significant discrepancies are observed at some energies, which will be studied further.